Overview

Dataset statistics

Number of variables: 12
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 2.5 MiB
Average record size in memory: 257.8 B

Variable types

Numeric: 6
Categorical: 6

Alerts

number_products is highly correlated with exited (High correlation)
exited is highly correlated with number_products (High correlation)
df_index is uniformly distributed (Uniform)
df_index has unique values (Unique)
tenure has 413 (4.1%) zeros (Zeros)
balance has 3617 (36.2%) zeros (Zeros)

Reproduction

Analysis started: 2022-03-23 12:15:18.116024
Analysis finished: 2022-03-23 12:23:46.231661
Duration: 8 minutes and 28.12 seconds
Software version: pandas-profiling v3.1.0
Download configuration: config.json
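
The reported duration can be checked against the two timestamps. The snippet below is only a small sanity check using the standard library; the variable names are illustrative and not part of the report.

```python
from datetime import datetime

# Timestamps copied from the Reproduction block of the report.
started = datetime.fromisoformat("2022-03-23 12:15:18.116024")
finished = datetime.fromisoformat("2022-03-23 12:23:46.231661")

# Split the elapsed time into whole minutes and remaining seconds.
elapsed = finished - started
minutes, seconds = divmod(elapsed.total_seconds(), 60)
print(f"{int(minutes)} minutes and {seconds:.2f} seconds")
```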

Variables

df_index
Real number (ℝ≥0)

UNIFORM
UNIQUE

Distinct: 10000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5000.5
Minimum: 1
Maximum: 10000
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 78.2 KiB

Quantile statistics

Minimum: 1
5th percentile: 500.95
Q1: 2500.75
Median: 5000.5
Q3: 7500.25
95th percentile: 9500.05
Maximum: 10000
Range: 9999
Interquartile range (IQR): 4999.5

Descriptive statistics

Standard deviation: 2886.89568
Coefficient of variation (CV): 0.5773214038
Kurtosis: -1.2
Mean: 5000.5
Median Absolute Deviation (MAD): 2500
Skewness: 0
Sum: 50005000
Variance: 8334166.667
Monotonicity: Strictly increasing
Histogram with fixed size bins (bins=50)

Common values

Value                Count  Frequency (%)
1                    1      < 0.1%
6671                 1      < 0.1%
6664                 1      < 0.1%
6665                 1      < 0.1%
6666                 1      < 0.1%
6667                 1      < 0.1%
6668                 1      < 0.1%
6669                 1      < 0.1%
6670                 1      < 0.1%
6672                 1      < 0.1%
Other values (9990)  9990   99.9%

Minimum 10 values

Value  Count  Frequency (%)
1      1      < 0.1%
2      1      < 0.1%
3      1      < 0.1%
4      1      < 0.1%
5      1      < 0.1%
6      1      < 0.1%
7      1      < 0.1%
8      1      < 0.1%
9      1      < 0.1%
10     1      < 0.1%

Maximum 10 values

Value  Count  Frequency (%)
10000  1      < 0.1%
9999   1      < 0.1%
9998   1      < 0.1%
9997   1      < 0.1%
9996   1      < 0.1%
9995   1      < 0.1%
9994   1      < 0.1%
9993   1      < 0.1%
9992   1      < 0.1%
9991   1      < 0.1%
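
The statistics reported for df_index are consistent with a column that holds the integers 1 to 10000 once each. The sketch below recomputes them with the standard library under that assumption. The `quantile` helper is hypothetical and uses linear interpolation between the closest ranks, which appears to reproduce the percentiles in the report.

```python
import statistics

# Assumption: df_index contains 1..10000, each value exactly once,
# as the Unique alert and the reported minimum and maximum suggest.
values = list(range(1, 10001))

mean = statistics.mean(values)
variance = statistics.variance(values)   # sample variance (n - 1 denominator)
std = statistics.stdev(values)           # sample standard deviation
cv = std / mean                          # coefficient of variation


def quantile(sorted_vals, q):
    """Quantile by linear interpolation between the two closest ranks."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


q1 = quantile(values, 0.25)
q3 = quantile(values, 0.75)
iqr = q3 - q1                            # interquartile range

print(sum(values), mean, round(variance, 3), round(std, 5), round(cv, 10))
print(round(quantile(values, 0.05), 2), q1, q3, iqr)
```

The same relationships (CV = standard deviation / mean, IQR = Q3 - Q1, range = maximum - minimum) hold for the other numeric variables in the report.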

credit_rating
Real number (ℝ≥0)

Distinct: 460
Distinct (%): 4.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 650.5288
Minimum: 350
Maximum: 850
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 78.2 KiB

Quantile statistics

Minimum: 350
5th percentile: 489
Q1: 584
Median: 652
Q3: 718
95th percentile: 812
Maximum: 850
Range: 500
Interquartile range (IQR): 134

Descriptive statistics

Standard deviation: 96.65329874
Coefficient of variation (CV): 0.14857651
Kurtosis: -0.4257256848
Mean: 650.5288
Median Absolute Deviation (MAD): 67
Skewness: -0.0716066082
Sum: 6505288
Variance: 9341.860157
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value               Count  Frequency (%)
850                 233    2.3%
678                 63     0.6%
655                 54     0.5%
705                 53     0.5%
667                 53     0.5%
684                 52     0.5%
670                 50     0.5%
651                 50     0.5%
683                 48     0.5%
652                 48     0.5%
Other values (450)  9296   93.0%

Minimum 10 values

Value  Count  Frequency (%)
350    5      0.1%
351    1      < 0.1%
358    1      < 0.1%
359    1      < 0.1%
363    1      < 0.1%
365    1      < 0.1%
367    1      < 0.1%
373    1      < 0.1%
376    2      < 0.1%
382    1      < 0.1%

Maximum 10 values

Value  Count  Frequency (%)
850    233    2.3%
849    8      0.1%
848    5      0.1%
847    6      0.1%
846    5      0.1%
845    6      0.1%
844    7      0.1%
843    2      < 0.1%
842    7      0.1%
841    12     0.1%

country
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 615.4 KiB

France: 5014
Germany: 2509
Spain: 2477

Length

Max length: 7
Median length: 6
Mean length: 6.0032
Min length: 5

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: France
2nd row: Spain
3rd row: France
4th row: France
5th row: Spain

Common Values

Value    Count  Frequency (%)
France   5014   50.1%
Germany  2509   25.1%
Spain    2477   24.8%

Length

Histogram of lengths of the category

Pie chart

Value    Count  Frequency (%)
france   5014   50.1%
germany  2509   25.1%
spain    2477   24.8%

gender
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 604.7 KiB

Male: 5457
Female: 4543

Length

Max length: 6
Median length: 4
Mean length: 4.9086
Min length: 4

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Female
2nd row: Female
3rd row: Female
4th row: Female
5th row: Female

Common Values

Value   Count  Frequency (%)
Male    5457   54.6%
Female  4543   45.4%

Length

Histogram of lengths of the category

Pie chart

Value   Count  Frequency (%)
male    5457   54.6%
female  4543   45.4%

age
Real number (ℝ≥0)

Distinct: 150
Distinct (%): 1.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 74.4136
Minimum: 0
Maximum: 149
Zeros: 62
Zeros (%): 0.6%
Negative: 0
Negative (%): 0.0%
Memory size: 78.2 KiB

Quantile statistics

Minimum: 0
5th percentile: 7
Q1: 37
Median: 74
Q3: 112
95th percentile: 142
Maximum: 149
Range: 149
Interquartile range (IQR): 75

Descriptive statistics

Standard deviation: 43.50270777
Coefficient of variation (CV): 0.5846069505
Kurtosis: -1.208103612
Mean: 74.4136
Median Absolute Deviation (MAD): 38
Skewness: 0.01045541717
Sum: 744136
Variance: 1892.485584
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value               Count  Frequency (%)
17                  83     0.8%
131                 83     0.8%
67                  83     0.8%
113                 83     0.8%
29                  83     0.8%
116                 82     0.8%
24                  81     0.8%
1                   81     0.8%
142                 81     0.8%
136                 80     0.8%
Other values (140)  9180   91.8%

Minimum 10 values

Value  Count  Frequency (%)
0      62     0.6%
1      81     0.8%
2      74     0.7%
3      46     0.5%
4      63     0.6%
5      68     0.7%
6      60     0.6%
7      77     0.8%
8      67     0.7%
9      69     0.7%

Maximum 10 values

Value  Count  Frequency (%)
149    76     0.8%
148    68     0.7%
147    65     0.7%
146    74     0.7%
145    60     0.6%
144    58     0.6%
143    68     0.7%
142    81     0.8%
141    74     0.7%
140    61     0.6%

tenure
Real number (ℝ≥0)

ZEROS

Distinct: 11
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.0128
Minimum: 0
Maximum: 10
Zeros: 413
Zeros (%): 4.1%
Negative: 0
Negative (%): 0.0%
Memory size: 78.2 KiB

Quantile statistics

Minimum: 0
5th percentile: 1
Q1: 3
Median: 5
Q3: 7
95th percentile: 9
Maximum: 10
Range: 10
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 2.892174377
Coefficient of variation (CV): 0.5769578633
Kurtosis: -1.165225227
Mean: 5.0128
Median Absolute Deviation (MAD): 2
Skewness: 0.01099145798
Sum: 50128
Variance: 8.364672627
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=11)

Common values

Value  Count  Frequency (%)
2      1048   10.5%
1      1035   10.3%
7      1028   10.3%
8      1025   10.2%
5      1012   10.1%
3      1009   10.1%
4      989    9.9%
9      984    9.8%
6      967    9.7%
10     490    4.9%

Minimum 10 values

Value  Count  Frequency (%)
0      413    4.1%
1      1035   10.3%
2      1048   10.5%
3      1009   10.1%
4      989    9.9%
5      1012   10.1%
6      967    9.7%
7      1028   10.3%
8      1025   10.2%
9      984    9.8%

Maximum 10 values

Value  Count  Frequency (%)
10     490    4.9%
9      984    9.8%
8      1025   10.2%
7      1028   10.3%
6      967    9.7%
5      1012   10.1%
4      989    9.9%
3      1009   10.1%
2      1048   10.5%
1      1035   10.3%

balance
Real number (ℝ≥0)

ZEROS

Distinct: 6382
Distinct (%): 63.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 76485.88929
Minimum: 0
Maximum: 250898.09
Zeros: 3617
Zeros (%): 36.2%
Negative: 0
Negative (%): 0.0%
Memory size: 78.2 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 97198.54
Q3: 127644.24
95th percentile: 162711.669
Maximum: 250898.09
Range: 250898.09
Interquartile range (IQR): 127644.24

Descriptive statistics

Standard deviation: 62397.4052
Coefficient of variation (CV): 0.8158028335
Kurtosis: -1.489411768
Mean: 76485.88929
Median Absolute Deviation (MAD): 46766.79
Skewness: -0.1411087109
Sum: 764858892.9
Variance: 3893436176
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value                Count  Frequency (%)
0                    3617   36.2%
130170.82            2      < 0.1%
105473.74            2      < 0.1%
85304.27             1      < 0.1%
159397.75            1      < 0.1%
144238.7             1      < 0.1%
112262.84            1      < 0.1%
109106.8             1      < 0.1%
142147.32            1      < 0.1%
109109.33            1      < 0.1%
Other values (6372)  6372   63.7%

Minimum 10 values

Value     Count  Frequency (%)
0         3617   36.2%
3768.69   1      < 0.1%
12459.19  1      < 0.1%
14262.8   1      < 0.1%
16893.59  1      < 0.1%
23503.31  1      < 0.1%
24043.45  1      < 0.1%
27288.43  1      < 0.1%
27517.15  1      < 0.1%
27755.97  1      < 0.1%

Maximum 10 values

Value      Count  Frequency (%)
250898.09  1      < 0.1%
238387.56  1      < 0.1%
222267.63  1      < 0.1%
221532.8   1      < 0.1%
216109.88  1      < 0.1%
214346.96  1      < 0.1%
213146.2   1      < 0.1%
212778.2   1      < 0.1%
212696.32  1      < 0.1%
212692.97  1      < 0.1%

number_products
Categorical

HIGH CORRELATION

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 566.5 KiB

1: 5084
2: 4590
3: 266
4: 60

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 3
4th row: 2
5th row: 1

Common Values

Value  Count  Frequency (%)
1      5084   50.8%
2      4590   45.9%
3      266    2.7%
4      60     0.6%

Length

Histogram of lengths of the category

Pie chart

Value  Count  Frequency (%)
1      5084   50.8%
2      4590   45.9%
3      266    2.7%
4      60     0.6%

credit_card
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 566.5 KiB

1: 7055
0: 2945

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 0
3rd row: 1
4th row: 0
5th row: 1

Common Values

Value  Count  Frequency (%)
1      7055   70.5%
0      2945   29.4%

Length

Histogram of lengths of the category

Pie chart

Value  Count  Frequency (%)
1      7055   70.5%
0      2945   29.4%

is_active
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 566.5 KiB

1: 5151
0: 4849

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 0
4th row: 0
5th row: 1

Common Values

Value  Count  Frequency (%)
1      5151   51.5%
0      4849   48.5%

Length

Histogram of lengths of the category

Pie chart

Value  Count  Frequency (%)
1      5151   51.5%
0      4849   48.5%

estimated_salary
Real number (ℝ≥0)

Distinct: 9999
Distinct (%): > 99.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 100090.2399
Minimum: 11.58
Maximum: 199992.48
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 78.2 KiB

Quantile statistics

Minimum: 11.58
5th percentile: 9851.8185
Q1: 51002.11
Median: 100193.915
Q3: 149388.2475
95th percentile: 190155.3755
Maximum: 199992.48
Range: 199980.9
Interquartile range (IQR): 98386.1375

Descriptive statistics

Standard deviation: 57510.49282
Coefficient of variation (CV): 0.5745864221
Kurtosis: -1.181518447
Mean: 100090.2399
Median Absolute Deviation (MAD): 49198.15
Skewness: 0.002085357662
Sum: 1000902399
Variance: 3307456784
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value                Count  Frequency (%)
24924.92             2      < 0.1%
101348.88            1      < 0.1%
55313.44             1      < 0.1%
72500.68             1      < 0.1%
182692.8             1      < 0.1%
4993.94              1      < 0.1%
124964.82            1      < 0.1%
161971.42            1      < 0.1%
39488.04             1      < 0.1%
187811.71            1      < 0.1%
Other values (9989)  9989   99.9%

Minimum 10 values

Value   Count  Frequency (%)
11.58   1      < 0.1%
90.07   1      < 0.1%
91.75   1      < 0.1%
96.27   1      < 0.1%
106.67  1      < 0.1%
123.07  1      < 0.1%
142.81  1      < 0.1%
143.34  1      < 0.1%
178.19  1      < 0.1%
216.27  1      < 0.1%

Maximum 10 values

Value      Count  Frequency (%)
199992.48  1      < 0.1%
199970.74  1      < 0.1%
199953.33  1      < 0.1%
199929.17  1      < 0.1%
199909.32  1      < 0.1%
199862.75  1      < 0.1%
199857.47  1      < 0.1%
199841.32  1      < 0.1%
199808.1   1      < 0.1%
199805.63  1      < 0.1%

exited
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 566.5 KiB

0: 7963
1: 2037

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 0
3rd row: 1
4th row: 0
5th row: 0

Common Values

Value  Count  Frequency (%)
0      7963   79.6%
1      2037   20.4%

Length

Histogram of lengths of the category

Pie chart

Value  Count  Frequency (%)
0      7963   79.6%
1      2037   20.4%

Interactions


Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables, and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
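
As a minimal sketch of that calculation, the code below ranks both variables (tied values share the mean of their positions) and then applies the covariance-over-standard-deviations formula to the ranks. The helper names and the sample data are illustrative assumptions, not part of the report or of pandas-profiling.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i..j, shifted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Covariance divided by the product of the standard deviations.

    The 1/(n - 1) factors cancel, so plain sums are used.
    """
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up data: y = x**3 is monotonic but not linear.
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
print(spearman(x, y), pearson(x, y))
```

On this toy input the rank correlation is at its maximum while Pearson's r stays below it, which illustrates why ρ is preferred for nonlinear monotonic relationships.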

Pearson's r

Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for a linear relationship the steepness of the line does not affect r; only the sign of the slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
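
The following sketch implements that formula directly and illustrates the invariance under location and scale changes. The function name and the sample values are assumptions made for illustration only.

```python
from math import sqrt


def pearson_r(x, y):
    """Sample covariance of x and y divided by the product of their sample standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    return cov / (std_x * std_y)


# Made-up data lying close to a straight line.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(x, y)
# Shifting and rescaling each variable separately (with positive factors) leaves r unchanged.
r_transformed = pearson_r([3 * a + 10 for a in x], [0.5 * b - 4 for b in y])
print(r, r_transformed)
```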

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
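
A direct translation of that definition is sketched below. It is the simple form without a tie correction (often called tau-a); library implementations commonly apply a tie-adjusted variant, so results can differ when the data contain ties. The function name and the sample data are illustrative assumptions.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1      # both variables move in the same direction
        elif direction < 0:
            discordant += 1      # the variables move in opposite directions
        # direction == 0 means a tie, counted as neither
    total_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / total_pairs


# Made-up data: one swapped pair out of six gives (5 - 1) / 6.
tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
print(tau)
```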

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
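
The sketch below shows one way to compute a bias-corrected Cramér's V from a contingency table, following the correction usually attributed to Bergsma (2013). It is not taken from the pandas-profiling source. The function name and the two example tables are made up for illustration, and the code assumes every row and column total is non-zero.

```python
from math import sqrt


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as a list of rows."""
    n = sum(sum(row) for row in table)
    r, k = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(row[j] for row in table) for j in range(k)]

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias correction of phi^2 and of the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))


# Hypothetical 2x2 tables: no association and perfect association.
independent = [[25, 25], [25, 25]]
perfect = [[50, 0], [0, 50]]
print(cramers_v_corrected(independent), cramers_v_corrected(perfect))
```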

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation for Phik is available.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

      df_index  credit_rating  country  gender  age  tenure  balance    number_products  credit_card  is_active  estimated_salary  exited
0     1         619            France   Female  110  2       0.00       1                1            1          101348.88         1
1     2         608            Spain    Female  38   1       83807.86   1                0            1          112542.58         0
2     3         502            France   Female  54   8       159660.80  3                1            0          113931.57         1
3     4         699            France   Female  0    1       0.00       2                0            0          93826.63          0
4     5         850            Spain    Female  54   2       125510.82  1                1            1          79084.10          0
5     6         645            Spain    Male    35   8       113755.78  2                1            0          149756.71         1
6     7         822            France   Male    144  7       0.00       2                1            1          10062.80          0
7     8         376            Germany  Female  127  4       115046.74  4                1            0          119346.88         1
8     9         501            France   Male    96   4       142051.07  2                0            1          74940.50          0
9     10        684            France   Male    96   2       134603.88  1                1            1          71725.73          0

Last rows

      df_index  credit_rating  country  gender  age  tenure  balance    number_products  credit_card  is_active  estimated_salary  exited
9990  9991      714            Germany  Male    114  3       35016.60   1                1            0          53667.08          0
9991  9992      597            France   Female  65   4       88381.21   1                1            0          69384.71          1
9992  9993      726            Spain    Male    104  2       0.00       1                1            0          195192.40         0
9993  9994      644            France   Male    103  7       155060.41  1                1            0          29179.52          0
9994  9995      800            France   Female  44   2       0.00       2                0            0          167773.55         0
9995  9996      771            France   Male    119  5       0.00       2                1            0          96270.64          0
9996  9997      516            France   Male    83   10      57369.61   1                1            1          101699.77         0
9997  9998      709            France   Female  39   7       0.00       1                0            1          42085.58          1
9998  9999      772            Germany  Male    70   3       75075.31   2                1            0          92888.52          1
9999  10000     792            France   Female  87   4       130142.79  1                1            0          38190.78          0